Celestin Apprentice 2

Text File | 1994-12-08 | 49.5 KB | 1,274 lines | [TEXT/R*ch]

C.S.M.P. Digest Tue, 18 Aug 92 Volume 1 : Issue 169 Today's Topics: Help! Stack overwriting heap! Microcode (was Re: Does the 68000 have a "block move" instruction) List Manager help sought The Comp.Sys.Mac.Programmer Digest is moderated by Michael A. Kelly. The digest is a collection of article threads from the internet newsgroup comp.sys.mac.programmer. It is designed for people who read c.s.m.p. semi- regularly and want an archive of the discussions. If you don't know what a newsgroup is, you probably don't have access to it. Ask your systems administrator(s) for details. (This means you can't post questions to the digest.) Each issue of the digest contains one or more sets of articles (called threads), with each set corresponding to a 'discussion' of a particular subject. The articles are not edited; all articles included in this digest are in their original posted form (as received by our news server at cs.uoregon.edu). Article threads are not added to the digest until the last article added to the thread is at least one month old (this is to ensure that the thread is dead before adding it to the digest). Article threads that consist of only one message are generally not included in the digest. The entire digest is available for anonymous ftp from ftp.cs.uoregon.edu [128.223.8.8] in the directory /pub/mac/csmp-digest. Be sure to read the file /pub/mac/csmp-digest/README before downloading any files. The most recent issues are available from sumex-aim.stanford.edu [36.44.0.6] in the directory /info-mac/digest/csmp. If you don't have ftp capability, the sumex archive has a mail server; send a message with the text '$MACarch help' (no quotes) to LISTSERV@ricevm1.rice.edu for more information. The digest is also available via email. Just send a note saying that you want to be on the digest mailing list to mkelly@cs.uoregon.edu, and you will automatically receive each new issue as it is created. Sorry, back issues are not available through the mailing list. 
Send administrative mail to mkelly@cs.uoregon.edu. ------------------------------------------------------- From: u2005681@ucsvc.ucs.unimelb.edu.au Subject: Help! Stack overwriting heap! Date: 15 Jul 92 10:45:32 GMT Organization: The University of Melbourne Hi people, I am manipulating a large array (six dimensional) of structs (containing 8 float numbers) using THINK C 5.0.2. The program runs fine on our SGI workstation but on the Mac I get a message that the stack is overwriting the heap (this has an interesting effect on the stability of the MacOS). What can I do and how do I increase the stack size? Thank you in advance Chris Hofflin +++++++++++++++++++++++++++ From: mlanett@Apple.COM (Mark Lanett) Date: 15 Jul 92 11:57:03 GMT Organization: Apple Computer Inc., Cupertino, CA u2005681@ucsvc.ucs.unimelb.edu.au writes: >Hi people, >I am manipulating a large array (six dimensional) of structs (containing >8 float numbers) using THINK C 5.0.2. The program runs fine on our SGI >workstation but on the Mac I get a message that the stack is overwriting >the heap (this has an interesting effect on the stability of the MacOS). >What can I do and how do I increase the stack size? Unless your matrix is very small, I doubt it'll fit on the stack. So my guess is that it's on the heap, and so has nothing whatsoever to do with a stack overflow. The best way to cause a stack overflow is just to make a programming error that causes an endless recursion somewhere (for example, leaving off inherited:: in C++. But I digress). Why this would occur on the Mac and not SGI could be due to any number of reasons. Now, if your matrix really is on the stack, I could see problems. But you say the *same* program is running on an SGI and Mac, which makes me think you've got it under MPW. And MPW gives lots (at least 128K) of stack space to its tools. Well anyway, you haven't said enough for me to not make these assumptions (wild guesses, whatever). How about a little more background? 
--
Have a bajillion brilliant Jobsian lithium licks.
Mark Lanett, NOT speaking for apple. Personal opinion only.

+++++++++++++++++++++++++++

From: phils@chaos.cs.brandeis.edu (Phil Shapiro)
Date: 15 Jul 92 14:39:21 GMT
Organization: Symantec Corp.

In article <70049@apple.Apple.COM> mlanett@Apple.COM (Mark Lanett) writes:

   u2005681@ucsvc.ucs.unimelb.edu.au writes:
   >Hi people,
   >I am manipulating a large array (six dimensional) of structs (containing
   >8 float numbers) using THINK C 5.0.2. The program runs fine on our SGI
   >workstation but on the Mac I get a message that the stack is overwriting
   >the heap (this has an interesting effect on the stability of the MacOS).
   >What can I do and how do I increase the stack size?

   Unless your matrix is very small, I doubt it'll fit on the stack. So
   my guess is that it's on the heap, and so has nothing whatsoever to do
   with a stack overflow.

Actually, THINK C 5.0 supports automatics of virtually any size. If
you like, you can declare local variables that are, say, 128K in size.

However, you're limited to the default stack size on your Mac. On a
machine with Color Quickdraw, you'll get a 24K stack, on all others
you'll get an 8K stack. If you want to grow your stack, you should
use a routine like:

    void GrowStack(long needed)    // needed is how much space we want
    {
        // Lower the heap limit by the shortfall between what we need
        // and the stack space currently available.
        SetApplLimit(GetApplLimit() - (needed - StackSpace()));
        // Expand the application heap up to the new limit.
        MaxApplZone();
    }

You must call this routine at the start of your program, before using
any of the standard I/O routines.

   But you say the *same* program is running on an SGI and Mac, which
   makes me think you've got it under MPW.

Now what's *that* supposed to mean ??
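
As a rough check on the sizes being discussed (an editorial sketch, not code from the thread): a struct of 8 floats is 32 bytes when a float is 4 bytes, so a six-dimensional array of them grows very quickly. The type name `Cell`, the helper names, and any extents passed in are hypothetical, since the original question never gives the array dimensions. The 8K and 24K figures are the default stack sizes quoted above.

```c
#include <stddef.h>

/* One array element: 8 floats, as described in the original question. */
typedef struct {
    float v[8];
} Cell;

/* Total bytes for a six-dimensional array of Cell with the given extents.
   The extents are hypothetical; the thread never states them. */
size_t array_bytes(const size_t dims[6])
{
    size_t total = sizeof(Cell);
    for (int i = 0; i < 6; i++)
        total *= dims[i];
    return total;
}

/* Nonzero if an array of that size would fit in a stack of stack_bytes,
   ignoring whatever the rest of the program needs on the stack. */
int fits_on_stack(const size_t dims[6], size_t stack_bytes)
{
    return array_bytes(dims) <= stack_bytes;
}
```

With an extent of 4 in every dimension this comes to 4096 elements of 32 bytes each, or 128K (assuming a 4-byte float), which is far more than either default stack. An array like that would need either a grown stack or a heap allocation.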
-phil - -- Phil Shapiro Software Engineer Language Products Group Symantec Corporation Internet: phils@cs.brandeis.edu +++++++++++++++++++++++++++ From: mlanett@Apple.COM (Mark Lanett) Date: 15 Jul 92 14:53:03 GMT Organization: Apple Computer Inc., Cupertino, CA phils@chaos.cs.brandeis.edu (Phil Shapiro) writes: >In article <70049@apple.Apple.COM> mlanett@Apple.COM (Mark Lanett (me)) writes: > u2005681@ucsvc.ucs.unimelb.edu.au writes: > >Hi people, > >I am manipulating a large array (six dimensional) of structs (containing > >8 float numbers) using THINK C 5.0.2. The program runs fine on our SGI... > But you say the *same* program is running on an SGI and Mac, which > makes me think you've got it under MPW. >Now what's *that* supposed to mean ?? Never mind. End of the day non-thinkingness. Does TC have a stdio mode? Yes, I know. Hmmm, does it have a curses lib? Yes, we do have it site-licensed, I just haven't used it since I started programming the Mac, when it was LightSpeed 2.0. - -- Have a bajillion brilliant Jobsian lithium licks. Mark Lanett, NOT speaking for apple. Personal opinion only. --------------------------- From: k044477@hobbes.kzoo.edu (Jamie R. McCarthy) Subject: Microcode (was Re: Does the 68000 have a "block move" instruction) Organization: Kalamazoo College Date: Tue, 23 Jun 1992 12:36:39 GMT neeri@iis.ethz.ch (Matthias Neeracher) writes: >cramer@unixland.natick.ma.us (Bill Cramer) writes: >> >>As someone else >>noted, on a 68020 or later processor, these instructions stay in >>the instruction cache, so it gives you the same (or better) speed >>as a block move instruction without the microcode penalty. > >While it is true that you would have to pay a performance penalty by having a >blockmove instruction, I'd like to point out that, as far as I know, the 680X0 >family *is* microcoded, and the instruction cache doesn't cache microcode. 
I think what Bill meant by "microcode penalty" was that adding "funny instructions" like a blockmove adds microcode to the processor, and in general, the more transistors you have, the slower the chip. I wouldn't think there are any modern-day processors that don't use microcode, are there? - -- Jamie McCarthy Internet: k044477@kzoo.edu AppleLink: j.mccarthy Never piss off a computer. +++++++++++++++++++++++++++ From: palmer@cco.caltech.edu (David Palmer) Organization: California Institute of Technology, Pasadena Date: Tue, 23 Jun 1992 13:17:16 GMT k044477@hobbes.kzoo.edu (Jamie R. McCarthy) writes: >neeri@iis.ethz.ch (Matthias Neeracher) writes: >>cramer@unixland.natick.ma.us (Bill Cramer) writes: >>> >>>As someone else >>>noted, on a 68020 or later processor, these instructions stay in >>>the instruction cache, so it gives you the same (or better) speed >>>as a block move instruction without the microcode penalty. >> >>While it is true that you would have to pay a performance penalty by having a >>blockmove instruction, I'd like to point out that, as far as I know, the 680X0 >>family *is* microcoded, and the instruction cache doesn't cache microcode. >I think what Bill meant by "microcode penalty" was that adding "funny >instructions" like a blockmove adds microcode to the processor, and in >general, the more transistors you have, the slower the chip. >I wouldn't think there are any modern-day processors that don't use >microcode, are there? I think that RISC processors might not have microcode as such. (A single clock cycle isn't enough time to execute many microcode instructions.) On a historical note, the original VAX 780 (the Complex Instruction Set Computer that convinced people that RISC was the way to go :-) had an instruction (I think it was a string compare) which, in the first release of the microcode, was actually slower than a short assembly language routine which did not use that instruction. 
(I learned about this back in ~1983 so it is not apocryphal.) - -- David Palmer palmer@tgrs.gsfc.nasa.gov I am now at Goddard Space Flight Center/NASA, for whom I do not speak. +++++++++++++++++++++++++++ From: stu5s11@bcrka280.bnr.ca Date: 23 Jun 92 21:21:01 GMT Organization: Bell-Northern Research, Ottawa, Canada >I think what Bill meant by "microcode penalty" was that adding "funny >instructions" like a blockmove adds microcode to the processor, and in >general, the more transistors you have, the slower the chip. > >I wouldn't think there are any modern-day processors that don't use >microcode, are there? Based on what I've heard microcode, is not only on it's way out, it's long gone. I believe that the 68040 uses microcode only in limited amounts, or not at all. The problem is that microcoding adds another layer of depth, causing additional delays. This means you can't crank up the speed of the processor clock. This probably has something to do with the fact that 68000's only go up to 16 Mhz, while 68030's are availible up to 50Mhz. Certainly no one is using nanocode, like the original 68000 does. This is the reason for the popularity of RISC. Getting rid of Complex instructions allowed you to get rid of microcode, and hardwire everything instead. With out the extra delays, you could crank up the processor speed. Even if emulating a complex instruction in software took 20% longer, you made it up with a 30%+ increase in clock speed. +++++++++++++++++++++++++++ From: Bruce.Hoult@bbs.actrix.gen.nz Organization: Actrix Information Exchange Date: Wed, 24 Jun 1992 08:16:08 GMT In article <1992Jun23.123639.21474@hobbes.kzoo.edu> k044477@hobbes.kzoo.edu (Jamie R. McCarthy) writes: > I wouldn't think there are any modern-day processors that don't use > microcode, are there? Every RISC processor, perhaps? :-) Even the 68040 in my Quadra 700 can't reply much on microcode, since in a couple of tests I've made it's managed around 20 MIPS from a 25 MHz clock. 
(If there was a 50 MHz or 100 MHz version of the '040 it would be a seriously respectable chip. OTOH, no doubt the CISC nature of the instruction set is exactly *why* there isn't a 100 MHz version :-( ) - -- Bruce.Hoult@bbs.actrix.gen.nz Twisted pair: +64 4 477 2116 BIX: brucehoult Last Resort: PO Box 4145 Wellington, NZ "Cray's producing a 200 MIPS personal computer with 64MB RAM and a 1 GB hard disk that fits in your pocket!" "Great! Is it PC compatable?" +++++++++++++++++++++++++++ From: Bruce.Hoult@bbs.actrix.gen.nz Date: Wed, 24 Jun 1992 08:46:50 GMT Organization: Actrix Information Exchange In article <1992Jun23.131716.25812@cco.caltech.edu> palmer@cco.caltech.edu (David Palmer) writes: > On a historical note, the original VAX 780 (the Complex Instruction > Set Computer that convinced people that RISC was the way to go :-) > had an instruction (I think it was a string compare) which, in > the first release of the microcode, was actually slower than > a short assembly language routine which did not use that instruction. > (I learned about this back in ~1983 so it is not apocryphal.) I think there were many instruction on the 11/780 for which that was true. And it's not just the VAX -- the same thing happens on the 68000. Just look at MOVEM.W (An)+,[some registers] with a timing of 12 + 4n cycles (where n is the number of registers loaded). A simple MOVE.W (An)+,Rn takes 8 cycles. That means that for one or two registers it's faster to use the cimple instructions, and for three it's a tie. Or look at the shift instructions. ASL.W #count,Dn takes 6 + 2n cycles and ADD.W Dn,Dn takes 4 cycles. That means that... ADD.W D2,D2 ADD.W D2,D2 ... is faster than ... ASL.W #2,D2 The same thing applys to the multiply instruction. Suppose you want to multiply an unsigned integer by 100 (decimal) which is 1100100 in binary. You can use... 

    #operand is in D2
    ADD.W  D2,D2
    ADD.W  D2,D2
    MOVE.W D2,D3
    ADD.W  D2,D2
    ADD.W  D2,D2
    ADD.W  D2,D3
    ADD.W  D2,D2
    ADD.W  D3,D2
    #operand*100 is in D2

...which is 7 ADD.W's @4 cycles each plus a MOVE.W @4 cycles for a
total of 32 cycles, instead of

    MULU.W #100,D2

... at 38 + 2n (n is number of 1 bits = 3) + 4 (for the immediate
operand) for a total of 48 clocks.

The eight instruction sequence is 50% faster than the multiply instruction!

(All timings are for the original 68000)

--
Bruce.Hoult@bbs.actrix.gen.nz Twisted pair: +64 4 477 2116
BIX: brucehoult Last Resort: PO Box 4145 Wellington, NZ
"Cray's producing a 200 MIPS personal computer with 64MB RAM and a 1 GB
hard disk that fits in your pocket!" "Great! Is it PC compatable?"

+++++++++++++++++++++++++++

From: stu5s11@bcrka280.bnr.ca
Date: 24 Jun 92 16:50:20 GMT
Organization: Bell-Northern Research, Ottawa, Canada

>Every RISC processor, perhaps? :-) Even the 68040 in my Quadra 700
>can't reply much on microcode, since in a couple of tests I've made
>it's managed around 20 MIPS from a 25 MHz clock. (If there was a 50
>MHz or 100 MHz version of the '040 it would be a seriously respectable
>chip. OTOH, no doubt the CISC nature of the instruction set is
>exactly *why* there isn't a 100 MHz version :-( )
>
>--
>Bruce.Hoult@bbs.actrix.gen.nz Twisted pair: +64 4 477 2116
>BIX: brucehoult Last Resort: PO Box 4145 Wellington, NZ
>"Cray's producing a 200 MIPS personal computer with 64MB RAM and a 1 GB
>hard disk that fits in your pocket!" "Great! Is it PC compatable?"

I'd just point out that the 68040 has two frequencies. The one that
is always mentioned is the Bus Clock (BCLK) which runs at either 25 Mhz or
33 Mhz. There is another clock called the Processor clock (PCLK) that
runs at double the frequency of the BCLK, ie. 50 Mhz or 66 Mhz.

I'm just pointing this out to show why 68040 speeds seem so low compared
to 030 speeds. (030s go up to 50Mhz, yet 040s run only at 25 Mhz.
In reality, parts of the 040 are really running at 50 Mhz.) There really isn't that much potential for a easy 50 Mhz (100 Mhz) 68040. Which isn't to say there won't be one, just that it will pushing the boundaries of IC technology, and not just a tune up. - ------------------------------------------------------------- John Andrusiak +++++++++++++++++++++++++++ From: peter@cujo.curtin.edu.au (Peter N Lewis) Organization: Curtin University of Technology Date: Thu, 25 Jun 1992 08:48:28 GMT Bruce.Hoult@bbs.actrix.gen.nz writes: >The same thing applys to the multiply instruction. Suppose you want to >multiply an unsigned integer by 100 (decimal) which is 1100100 in binary. >You can use... >#operand is in D2 >ADD.W D2,D2 >ADD.W D2,D2 >MOVE.W D2,D3 >ADD.W D2,D2 >ADD.W D2,D2 >ADD.W D2,D3 >ADD.W D2,D2 >ADD.W D3,D2 >#operand*100 is in D2 >...which is 7 ADD.W's @4 cycles each plus a MOVE.W @4 cycles for a >total of 32 cycles, instead of >MULU.W #100,D2 >... at 38 + 2n (n is number of 1 bits = 3) + 4 (for the immediate >operand) for a total of 48 clocks. >The eight instruction sequence is 50% faster than the multiply instruction! Ahh yes, but only one of them calculates D2 * 100 :-) Now if we were interested in D2 * 52, well your logic might work quite well then :-) I think I'll stick with x:=100*y myself :-) And anyway, the Z80 is better than the 6502 :-) Peter. +++++++++++++++++++++++++++ From: peter@cujo.curtin.edu.au (Peter N Lewis) Date: 25 Jun 92 08:42:46 GMT Organization: NCRPDA, Curtin University In article <1992Jun24.084650.1693@actrix.gen.nz>, Bruce.Hoult@bbs.actrix.gen.nz wrote: > The same thing applys to the multiply instruction. Suppose you want to > multiply an unsigned integer by 100 (decimal) which is 1100100 in binary. > You can use... 
> > #operand is in D2 > ADD.W D2,D2 > ADD.W D2,D2 > MOVE.W D2,D3 > ADD.W D2,D2 > ADD.W D2,D2 > ADD.W D2,D3 > ADD.W D2,D2 > ADD.W D3,D2 > #operand*100 is in D2 > > ...which is 7 ADD.W's @4 cycles each plus a MOVE.W @4 cycles for a > total of 32 cycles, instead of > > MULU.W #100,D2 > > ... at 38 + 2n (n is number of 1 bits = 3) + 4 (for the immediate > operand) for a total of 48 clocks. > > The eight instruction sequence is 50% faster than the multiply instruction! Ahh yes, but only one of them calculates D2 * 100 :-) Now if we were interested in D2 * 52, well your logic might work quite well then :-) And anyway, the Z80 is better than the 6502 :-) Peter. +++++++++++++++++++++++++++ From: jackb@mdd.comm.mot.com (Jack Brindle) Date: 25 Jun 92 16:46:06 GMT Organization: Motorola, Mobile Data Division - Seattle, WA In article <1992Jun25.084828.22882@cujo.curtin.edu.au> peter@cujo.curtin.edu.au (Peter N Lewis) writes: > >And anyway, the Z80 is better than the 6502 :-) Gee, I thought this was a Mac forum. But since you brought it up, the 6502 at the same clock rate bench marks quite a bit faster than the Z80. A 4MHz 6502 will simply blow away a 4 MHz Z80 (or even an 8 MHz Z80 for that matter). The 6502 continues to be the most widely produced 8 bit processor in the world (for about the 14th year in a row). It is very widely used in single chip versions in all sort of products; the Mitsubishi versions are used in many Japanese-made appliances like microwave ovens and such. Amazing that after 17 years of existance it still holds a commanding lead... Now maybe we can get back to Mac issues. Actually the pertinence of this topic is the fact that the IIfx contains at least one, if not two 6502 microcells in its logic, and the fact that the IWM and SWIM are basically hardware implementations of Apple's 6502 based disk controller. It's interesting to see the Lisa / Mac-XL's floppy controller - it is a 6504 implementation of the basic Apple II disk subsystem! 
Jack Brindle ham radio: wa4fib internet: jackb@mdd.comm.mot.com +++++++++++++++++++++++++++ From: Bruce.Hoult@bbs.actrix.gen.nz Date: Fri, 26 Jun 1992 02:27:35 GMT Organization: Actrix Information Exchange In article <1992Jun25.084828.22882@cujo.curtin.edu.au> peter@cujo.curtin.edu.au (Peter N Lewis) writes: > >The eight instruction sequence is 50% faster than the multiply instruction! > > Ahh yes, but only one of them calculates D2 * 100 :-) Now if we were > interested in D2 * 52, well your logic might work quite well then :-) > I think I'll stick with x:=100*y myself :-) Uh, yeah, I noticed that a while after I posted it. I was wondering whether anyone would read closely enough to notice :-) Just include an extra ADD.W (or change the sequence of (now) three ADD.W's to an ASL.W #3 which takes the same amount of time) and change the "50% faster" to "33% faster". > And anyway, the Z80 is better than the 6502 :-) I once wrote a few algorithms in an optimal manner (or as close as I could get :-) for both the 6502 and Z80. The Z80 required about three times as many clock cycles as the 6502 (and incidentally was much harder to find the optimal sequence for) which meant that the then common 4 MHz Z80's were just a little faster than the 1 MHz 6502. Now you can get 10 MHz 6502's. Where are the 30 MHz Z80's? :-) - -- Bruce.Hoult@bbs.actrix.gen.nz Twisted pair: +64 4 477 2116 BIX: brucehoult Last Resort: PO Box 4145 Wellington, NZ "Cray's producing a 200 MIPS personal computer with 64MB RAM and a 1 GB hard disk that fits in your pocket!" "Great! Is it PC compatable?" +++++++++++++++++++++++++++ From: d88-jwa@dront.nada.kth.se (Jon W{tte) Date: 28 Jun 92 23:42:37 GMT Organization: Royal Institute of Technology, Stockholm, Sweden .kzoo.edu> k044477@hobbes.kzoo.edu (Jamie R. 
McCarthy) writes: >While it is true that you would have to pay a performance penalty by having a >blockmove instruction, I'd like to point out that, as far as I know, the 680X0 >family *is* microcoded, and the instruction cache doesn't cache microcode. Why would it; access time to microcode is (pretty much) instaneous - at least fast enough for 0 wait cycles which eliminates the need for microcoding. Usually microcode is hardwired in the logic, even. instructions" like a blockmove adds microcode to the processor, and in general, the more transistors you have, the slower the chip. Not exactly true. If you have more transistors in SERIES in the signal path, you get slower chips, However, parallell transistors like those for caches, or for registers, don't slow things down, quite the opposite ! I wouldn't think there are any modern-day processors that don't use microcode, are there? Yes, basically that's what a CISC chip is, while RISC chips usually do not rely on microcode. However, how do we classify the 68040 which executes all basic instructions (add, compare, move registers, ...) in one cycle, and store instructions in two, just like a RISC ? It also has a sexy pipeline... Trivia: For the Power chip, a branch (conditional) takes the whopping amount of 0 cycles to execute (!) - -- Jon W{tte, Svartmangatan 18, S-111 29 Stockholm, Sweden "Difficult, obscure, incoherent and nonstandard does not imply more power." - Andrew Kass in comp.sys.mac.hardware +++++++++++++++++++++++++++ From: d88-jwa@dront.nada.kth.se (Jon W{tte) Date: 28 Jun 92 23:47:33 GMT Organization: Royal Institute of Technology, Stockholm, Sweden .comm.mot.com> jackb@mdd.comm.mot.com (Jack Brindle) writes: Now maybe we can get back to Mac issues. 
Actually the pertinence of this topic is the fact that the IIfx contains at least one, if not two 6502 microcells in its logic, and the fact that the IWM and SWIM are Two, as well as the Quadra 900 and 950 (but not 700) - -- Jon W{tte, Svartmangatan 18, S-111 29 Stockholm, Sweden "Difficult, obscure, incoherent and nonstandard does not imply more power." - Andrew Kass in comp.sys.mac.hardware +++++++++++++++++++++++++++ From: brad@titan.austin.ibm.com (Brad Garton) Date: 30 Jun 92 23:42:53 GMT Organization: IBM Advanced Workstation Division > >While it is true that you would have to pay a performance penalty by having a > >blockmove instruction, I'd like to point out that, as far as I know, the 680X0 > >family *is* microcoded, and the instruction cache doesn't cache microcode. >(Jon W writes ) >Why would it; access time to microcode is (pretty much) >instaneous - at least fast enough for 0 wait cycles which >eliminates the need for microcoding. Usually microcode is >hardwired in the logic, even. It IS microcoded and microcode access is NOT instantaneous. Microcode is more like a ROM that drives control signals for the internal logic of the processor, the longer the microcode the longer it takes to decode addresses within the microcode ROM and potentially the longer it takes to run ALL instructions. This instruction decode time for processors with complex instruction sets is one of the reasons why we have RISC processors. Also instructions like this tend to break pipelines. Maybe someone from Motorola can help out on why this is true for the 680x0 family. >> instructions" like a blockmove adds microcode to the processor, and in >> general, the more transistors you have, the slower the chip. >Not exactly true. If you have more transistors in SERIES in the >signal path, you get slower chips, However, parallell transistors >like those for caches, or for registers, don't slow things down, >quite the opposite ! Jamie's argument is a good one though. 
Also this is an optimization problem you are balancing cost vs function vs reliability. If you trade off a lot of transistors for this function you don't have them to do equally valuable things OR you wind up with a chip die thats an inch square and costs 3K. and the yield is .001 % >Yes, basically that's what a CISC chip is, while RISC chips >usually do not rely on microcode. However, how do we classify the ^^^^^^^^^^^^^^^^^ >68040 which executes all basic instructions (add, compare, move >registers, ...) in one cycle, and store instructions in two, just >like a RISC ? It also has a sexy pipeline... As a very competitive processor. >Trivia: For the Power chip, a branch (conditional) takes the >whopping amount of 0 cycles to execute (!) I wish this were true, it is usuallly, but not always. I'm nit picking. It is still pretty good at guessing the right path. Brad ___________________________________________________________ | Brad Garton (512) 838-1333 | brad@titan.austin.ibm.com | | I speak for myself not IBM | VNET: GARTON AT AUSTIN | +---------------------------------------------------------+ +++++++++++++++++++++++++++ From: d88-jwa@dront.nada.kth.se (Jon W{tte) Organization: Royal Institute of Technology, Stockholm, Sweden Date: Wed, 1 Jul 1992 10:35:07 GMT .ibm.com> brad@titan.austin.ibm.com (Brad Garton) writes: [someone writes] > >family *is* microcoded, and the instruction cache doesn't cache microcode. >(Jon W writes ) >Why would it; access time to microcode is (pretty much) >instaneous - at least fast enough for 0 wait cycles which It IS microcoded and microcode access is NOT instantaneous. Microcode is more like a ROM that drives control signals for the internal logic of Microcode ACCESS _IS_ instaneous (enough) so that it would NOT be necessary to cache the microcode. However, microcode EXECUTION, of course, takes time, but that's another kettle of yellow sharks entirely. run ALL instructions. 
This instruction decode time for processors with complex instruction sets is one of the reasons why we have RISC processors. RISC throws away microcode entirely. More microcode -> larger microcode addresses and storage -> more transistors needed, but not necessarily slower execution since address buses to microcode (not to be confised with the external address bus) are parallell. Or do you sincerely believe that an 8 bit bus would be faster ? :-) Also instructions like this tend to break pipelines. Maybe someone from Motorola can help out on why this is true for the 680x0 family. As soon as instructions take more than one cycle before the next instruction start/execute may take place, it leaves bubbles in the pipeline. That happens on RISC too, with load and store instructions. (An instruction on an MC68040 actually takes 6 cycles from start to finish, but since the pipe is six stages, the "normal" instruction execution time is - 1 cycle !) >Trivia: For the Power chip, a branch (conditional) takes the >whopping amount of 0 cycles to execute (!) I wish this were true, it is usuallly, but not always. I'm nit picking. It is still pretty good at guessing the right path. Yes, and it requires the condition code it needs to be available a few cycles beforehand, else we get pipeline problems. However, in a typical non-trivial for or while loop, we actually get a 0-cycle branch. Nice touch with the 8 "CCR"-s too, to avoid that bottleneck... - -- Jon W{tte, Svartmangatan 18, S-111 29 Stockholm, Sweden "Difficult, obscure, incoherent and nonstandard does not imply more power." - Andrew Kass in comp.sys.mac.hardware +++++++++++++++++++++++++++ From: paul@taniwha.UUCP (Paul Campbell) Date: 30 Jun 92 15:42:58 GMT Organization: Taniwha Systems Design >Bruce.Hoult@bbs.actrix.gen.nz writes: >The same thing applys to the multiply instruction. Suppose you want to >multiply an unsigned integer by 100 (decimal) which is 1100100 in binary. >You can use... 
>#operand is in D2 >ADD.W D2,D2 >ADD.W D2,D2 >MOVE.W D2,D3 >ADD.W D2,D2 >ADD.W D2,D2 >ADD.W D2,D3 >ADD.W D2,D2 >ADD.W D3,D2 >#operand*100 is in D2 >...which is 7 ADD.W's @4 cycles each plus a MOVE.W @4 cycles for a >total of 32 cycles, instead of >MULU.W #100,D2 >... at 38 + 2n (n is number of 1 bits = 3) + 4 (for the immediate >operand) for a total of 48 clocks. >The eight instruction sequence is 50% faster than the multiply instruction! But it takes 16 bytes of instructions compared with 4 bytes, the difference is about a whole cache line, the extra time it takes to read this is about ~60+3*40+100 (assuming worst case RAS precharge delays) = 280nS which on a system with a 50MHz cpu core clock (ie a 50MHz '030 or 25MHz '040) is 14 clocks which makes total execution time for the first case to be 46 clocks, if it's not in the cache already (ie it's not in an inner loop) - still faster, but maybe not worth the effort. It also depends on which CPU you use, my '020 manual gives best case times for mul.w as 25 clocks best case/28 worst case, an '000 gives 70 clocks max, an '010 40 clocks, I can't find my '030/'040 manuals but you get the idea .... Genericly on CPUs with a full barrel shifter asr/asl complete is a clock or two, on a CPU with a hardware multiplier (rather than a shift-and-add engine) the multiply will run in a couple of clocks. 
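
The cache-miss estimate above can be reproduced with a few lines of C (an editorial sketch; the function names are hypothetical). It converts the quoted 60 + 3*40 + 100 ns memory delay into clocks at the quoted 50 MHz core clock and adds it to the 32-clock figure taken from the quoted sequence.

```c
/* Convert a delay in nanoseconds to whole CPU clocks, rounding up.
   clocks = ns * MHz / 1000 */
unsigned ns_to_clocks(unsigned delay_ns, unsigned clock_mhz)
{
    return (delay_ns * clock_mhz + 999u) / 1000u;
}

/* Clocks for a code sequence whose instructions must first be fetched
   from RAM: execution clocks plus the fetch delay expressed in clocks. */
unsigned cold_sequence_clocks(unsigned exec_clocks, unsigned fetch_ns,
                              unsigned clock_mhz)
{
    return exec_clocks + ns_to_clocks(fetch_ns, clock_mhz);
}
```

Calling `ns_to_clocks(60 + 3 * 40 + 100, 50)` gives the 14 clocks stated above, and `cold_sequence_clocks(32, 280, 50)` gives the 46-clock total. Note that this mixes a 68000 cycle count with the memory timing of a much faster machine, so it is only as meaningful as that combination.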
The point I'm trying to make here is that this stuff isn't at all cut and dried, if you target a particular CPU in a particular memory sub-system you can make this trade off - your compiler probably can't for cases like the one above but probably can for more obvious cases like 3*x and 10*x and 15*x because it's targetted at the whole range of Macs - from 8MHz '000s up to 25/50MHz '040s, I'm pretty sure that when you tell MPW to make '020 code it doesn't know to tune like this, instead it just targets instructions that are not available on the 16-bit CPUs Paul Campbell SuperMac - -- Paul Campbell UUCP: ..!mtxinu!taniwha!paul AppleLink: CAMPBELL.P "'Potato', not 'Potatoe'" Bart Simpson - on the blackboard 6/25/92 +++++++++++++++++++++++++++ From: Bruce.Hoult@bbs.actrix.gen.nz Date: 3 Jul 92 08:51:59 GMT Organization: Actrix Information Exchange In article <1138@taniwha.UUCP> paul@taniwha.UUCP (Paul Campbell) writes: > >Bruce.Hoult@bbs.actrix.gen.nz writes: > > >The same thing applys to the multiply instruction. Suppose you want to > >multiply an unsigned integer by 100 (decimal) which is 1100100 in binary. > >You can use... [my code that accidently multiplies by 52 instead of 100 deleted :-) (I noticed 10 mins after I posted it and one person emailed me about it later, btw] > But it takes 16 bytes of instructions compared with 4 bytes, the difference > is about a whole cache line, the extra time it takes to read this is about > ~60+3*40+100 (assuming worst case RAS precharge delays) = 280nS which on a > system with a 50MHz cpu core clock (ie a 50MHz '030 or 25MHz '040) is 14 clocks > which makes total execution time for the first case to be 46 clocks, if > it's not in the cache already (ie it's not in an inner loop) - still faster, > but maybe not worth the effort. Uh, those numbers are explicitly 68000 timings, *not* anything with a cache. 
> It also depends on which CPU you use, my '020 manual gives best case times > for mul.w as 25 clocks best case/28 worst case, an '000 gives 70 clocks > max, an '010 40 clocks, I can't find my '030/'040 manuals but you get the > idea .... > > Genericly on CPUs with a full barrel shifter asr/asl complete is a clock or > two, on a CPU with a hardware multiplier (rather than a shift-and-add engine) > the multiply will run in a couple of clocks. The point is that I find it interesting that on a machine with a shift-and-add microcode engine for multiply you can often beat it with macrocode. > The point I'm trying to make here is that this stuff isn't at all cut and dried, > if you target a particular CPU in a particular memory sub-system you can > make this trade off - your compiler probably can't for cases like the one above > but probably can for more obvious cases like 3*x and 10*x and 15*x because it's > targetted at the whole range of Macs - from 8MHz '000s up to 25/50MHz '040s, I agree entirely. I'm not suggesting this as a normal technique except in the most critical situations, and even there *test* it too see if it really is faster. There can be quite an advantage on the 68000. I haven't worked out the numbers, but I'd expect a somewhat smaller advantage on the '020 and '030. I'd expect this sort of thing to be completely counter- productive on the '040. You might note that the original message I was replying to was talking about how bad the VAX 11/780 was because you could often beat the more complex instructions with a sequence of simpler ones. I'm merely pointing out that the VAX is not alone in this and other CPU's of the same time period are similar. That's what prompted the so called "RISC" techniques in the first place, and what led to even CISC CPUs such as the 68040 and 80486 and recent VAXes getting large performance improvements without increasing clock rates. 
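
The arithmetic behind the multiply-by-100 discussion can be checked with a small C model (an editorial sketch, not code from any poster). Each statement stands in for one 68000 word instruction. `posted_sequence` follows the sequence as originally posted and yields 52 times the operand. `corrected_sequence` adds one more ADD.W; its position is an assumption made here, since the correction in the thread does not say where the extra instruction goes. `mulu_immediate_clocks` applies the 38 + 2n + 4 formula quoted in the thread for the original 68000.

```c
#include <stdint.h>

/* The sequence as originally posted; results wrap at 16 bits like ADD.W. */
uint16_t posted_sequence(uint16_t d2)
{
    uint16_t d3;
    d2 = (uint16_t)(d2 + d2);   /* ADD.W  D2,D2   D2 = 2x  */
    d2 = (uint16_t)(d2 + d2);   /* ADD.W  D2,D2   D2 = 4x  */
    d3 = d2;                    /* MOVE.W D2,D3   D3 = 4x  */
    d2 = (uint16_t)(d2 + d2);   /* ADD.W  D2,D2   D2 = 8x  */
    d2 = (uint16_t)(d2 + d2);   /* ADD.W  D2,D2   D2 = 16x */
    d3 = (uint16_t)(d3 + d2);   /* ADD.W  D2,D3   D3 = 20x */
    d2 = (uint16_t)(d2 + d2);   /* ADD.W  D2,D2   D2 = 32x */
    d2 = (uint16_t)(d2 + d3);   /* ADD.W  D3,D2   D2 = 52x */
    return d2;
}

/* One extra doubling before the ADD.W D2,D3 (placement assumed). */
uint16_t corrected_sequence(uint16_t d2)
{
    uint16_t d3;
    d2 = (uint16_t)(d2 + d2);   /* D2 = 2x   */
    d2 = (uint16_t)(d2 + d2);   /* D2 = 4x   */
    d3 = d2;                    /* D3 = 4x   */
    d2 = (uint16_t)(d2 + d2);   /* D2 = 8x   */
    d2 = (uint16_t)(d2 + d2);   /* D2 = 16x  */
    d2 = (uint16_t)(d2 + d2);   /* D2 = 32x  (the extra ADD.W) */
    d3 = (uint16_t)(d3 + d2);   /* D3 = 36x  */
    d2 = (uint16_t)(d2 + d2);   /* D2 = 64x  */
    d2 = (uint16_t)(d2 + d3);   /* D2 = 100x */
    return d2;
}

/* 68000 clocks for MULU.W #imm,Dn using the formula quoted in the thread:
   38 + 2n + 4, where n is the number of 1 bits in the multiplier. */
unsigned mulu_immediate_clocks(uint16_t multiplier)
{
    unsigned ones = 0;
    for (; multiplier != 0; multiplier >>= 1)
        ones += multiplier & 1u;
    return 38u + 2u * ones + 4u;
}
```

The corrected sequence is nine instructions at 4 cycles each, or 36 cycles, against 48 for MULU.W #100,D2. The ratio 48/36 matches the revised "33% faster" figure given earlier in the thread.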

- --
Bruce.Hoult@bbs.actrix.gen.nz   Twisted pair: +64 4 477 2116
BIX: brucehoult                 Last Resort:  PO Box 4145 Wellington, NZ
"Cray's producing a 200 MIPS personal computer with 64MB RAM and a 1 GB
hard disk that fits in your pocket!"   "Great!  Is it PC compatible?"

+++++++++++++++++++++++++++

From: bayes@hplvec.LVLD.HP.COM (Scott Bayes)
Date: 9 Jul 92 23:37:23 GMT
Organization: Hewlett-Packard Co., Loveland, CO

On a 68040 we came up with a very nearly optimum block move; that is not a
guess, but a fact, as we had an HP 16500 analyzer on the bus, and could see that
we were executing no-wait-state moves of a large amount of data, maybe even a
full scanline of the CRT (I can't remember) before overhead (non-data-move
cycles) appeared. The overhead was required to move pointers, etc.

Implementation was in assembly; I don't think there's any compiler written
that could have come up with this one!

Granted it was a very special case. The scanline is 1024 bytes wide, and the
code had to scroll up or down by one character cell height. It is cache-line
aligned (i.e. left edge of the framebuf starts on a cache-line boundary). The
frame buffer RAM is 0 wait state for both read and write, and supports block
(? I think that's the term) mode access: viz. accessing a single address loads
the whole cache line surrounding the address. Copyback caching is obviously
enabled. The bus is 32/32 at 80nsec. It ain't a Mac.

The technique was to

    read a single longword at each successive 16 byte address in unrolled loop.
    This caused the whole cacheline to be loaded into D-cache, even though
    the processor thought it was only loading 1/4 line. (We could not move16
    it, because we were moving it to D0 all the time.)

    Move16 from the source to destination addresses in unrolled loop.

Loops were unrolled out to the point where the whole loop to handle a scanline
plus the code to bump pointers and the outer loop just fit into I-cache.
This eliminated most inner-loop overhead, 'cause the inner loop only executed
once or twice as I remember. After the first scan line had been moved, all code
was in I-cache and the rest of the scroll burned rubber.

The code sort of looked like this:

    <unlock D-cache>
    move.l  (a0),d0
    adda.l  #16,a0          ; skip to next cacheline
    <repeat above unrolled enough to cram I-cache>

then reset aregs and

    <lock D-cache>
    move16  (a0)+,(a1)+
    <unrolled>

The move16s sourced from cache, so there were only writes on the bus in the
second loop. The memory controller worked most efficiently when I/O direction
didn't change, so it was best to do all cacheloads separately from the flush.

On the analyzer we saw hundreds of inbound memory cycles at maximum bus speed,
a short gap as the pointers were reset, then the same number of outbound
cycles at max bus speed, then another short gap, and repeat for the next chunk.

Throughput was about 45MB/sec over the whole routine compared to a theoretical
max for the bus of 50MB/sec.

But the weird things were that crazy first loop, and seeing the analyzer show
very long strings of uninterrupted I/O at max rate.

The algorithm served its purpose extremely well, and the hand-tuning was
justified (no arguments entertained on that last point! and no flames on
coding practices, either--management demanded maximum performance from
inadequate H/W)

+++++++++++++++++++++++++++

From: bayes@hplvec.LVLD.HP.COM (Scott Bayes)
Date: Wed, 15 Jul 1992 21:05:29 GMT
Organization: Hewlett-Packard Co., Loveland, CO

In general I agree with:

> you have to work hard to do the job... :-)  BTW, I'm not management, but I
> have a definite leaning towards maximally efficient use of modest hardware.
> Hardware costs are per-unit, but development costs can be amortized
> nicely over the entire product life.  If your instrument ends up costing
> several hundred dollars less than your competitor's RISC-y business, but is
> equivalent in function and performance guess who wins?

but the H/W Lab published figures that showed we could get better performance
than we actually attained, without these "heroic" coding measures. Too late,
we discovered: "NOT!". So the project was delayed while we tuned this up,
which cost $.

The same display was to be available on an '030 machine in the family. Because
the performance was not acceptable, that configuration is not being sold, though
we feel it would have been saleable, modulo performance. More $.

Finally, as things got desperate, the H/W lab started proposing things like
"we'll hack up a H/W mover for you for only a few $100000 cost in design,
board layout, tooling, etc. Oh, and an extra $25/board."

I'm used to squeezing performance out of H/W. At some point we reach a
crossover of per unit cost vs sum(development cost, delay opportunity cost).
I couldn't say exactly where we came out in the case cited.

ScottB

+++++++++++++++++++++++++++

From: bayes@hplvec.LVLD.HP.COM (Scott Bayes)
Date: Wed, 15 Jul 1992 21:11:02 GMT
Organization: Hewlett-Packard Co., Loveland, CO

One last point on clever coding. Sales lifetimes of computing systems are
decreasing. We have had quite a few systems lately whose sales lives are
measured in months, rather than years. The break-even comes more and more onto
the side of "develop quickly and cheaply" in that situation. As technology
advances at more and more rapid rates in our field (at least, so I believe),
the balance moves farther and farther in that direction.

The schedule pressure on my project was immense, and quality suffered elsewhere,
when programmers were putting in 16 hour days for months: you get too worn down
to do a good job.

ScottB

---------------------------

From: kishon-amir@CS.YALE.EDU (amir kishon)
Subject: List Manager help sought
Organization: Yale University Computer Science Dept., New Haven, CT 06520-2158
Date: Tue, 14 Jul 1992 02:36:55 GMT

I would like to use the List Manager to implement a table of 3 columns.
The only thing different about this table is that when I click on a cell
I would like to highlight the whole row which this cell belongs to
(rather than just highlighting the particular cell).

For example:

    A1 A2 A3
    B1 B2 B3
    C1 C2 C3

if I click on either B1 B2 or B3 I would like B1 B2 and B3 to be
highlighted.

Unfortunately, the List Manager only highlights one cell given a mouse
click. Any help would be much appreciated.

- -Amir
- --
Amir Kishon                              ARPA:   kishon-amir@cs.yale.edu
Yale University, Computer Science Dept.          kishon-amir@yale.arpa
P.O.Box 2158 Yale Station                BITNET: kishon-amir@yalecs.bitnet
New Haven, CT. 06520-2158                UUCP:   decvax!yale!kishon-amir

+++++++++++++++++++++++++++

From: dave@gergo.tamu.edu (Dave Martin)
Date: 14 Jul 92 12:53:00 GMT
Organization: Geochemical & Environmental Research Group, Texas A&M University

In article <1992Jul14.023655.8607@cs.yale.edu>, kishon-amir@CS.YALE.EDU (amir kishon) writes...
>I would like to use the List Manager to implement a table of 3
>columns. The only thing different about this table is that when I
>click on a cell I would like to highlight the whole row which this
>cell belongs to (rather than just highlighting the particular cell).
>if I click on either B1 B2 or B3 I would like B1 B2 and B3 to be
>highlighted.
>
>Unfortunately, the List Manager only highlights one cell given a mouse
>click. Any help would be much appreciated.

What you could do is -- each time there is a mousedown event and it is in the
list area -- get the selected cell then do a LSetSelect(true,...) on the other
cells in that row.  That should work (I think).

- - - Dave Martin - Geochemical & Environmental Research Group, Texas A&M -
- DAVE@GERGA[GERGO,GERGI].TAMU.EDU - BROOKS@TAMVXOCN.BITNET - AOL:DBM - - -

+++++++++++++++++++++++++++

From: Jerome Chan <yjc@po.cwru.edu>
Organization: Alethea, The Twilight World!
Date: Tue, 14 Jul 92 14:18:36 GMT

In article <14JUL199206531047@gergo.tamu.edu> Dave Martin,
dave@gergo.tamu.edu writes:
>>I would like to use the List Manager to implement a table of 3

   If I use ResEdit and copy another LDEF from another application and put
it into mine, would I be able to use it if I pass the correct resource
ID as theProc in LNew(...)? Would it respond as the built-in LDEF?

- --NewBie Programmer At Work--
- --Day 2 and still going--
- --- Fading

+++++++++++++++++++++++++++

From: haynes@mace.cc.purdue.edu (Carl W. Haynes III)
Date: 14 Jul 92 17:22:48 GMT
Organization: Purdue University

In article <1992Jul14.141836.1165@usenet.ins.cwru.edu> yjc@po.cwru.edu (Jerome Chan) writes:
>
>   If I use ResEdit and copy another LDEF from another application and put
>it into mine, would I be able to use it if I pass the correct the resouce
>ID as theProc in LNew(...)? Would it respond as the built-in LDEF?

Only if you know the data format for the cells.  For example, in some of my
LDEFs I just place a pointer or handle in the cell; the LDEF then knows
what to do with the pointer or handle.  An LDEF that displays icons may take
just the icon id or the actual icon data itself.  Unless you know what to put
in the cell, it is unlikely that you will be able to use another person's LDEF.

Just as an aside, LDEFs are probably the easiest of all the xDEFs to write.
They are very straightforward.  I know that the second edition of Macintosh
Programming Secrets has an example you can look at.

carl

- --
Carl W. Haynes III
Haynes Consulting Services    ||  CWH3@aol.com
PO Box 2715                   ||  haynes@mace.cc.purdue.edu
W. Lafayette, IN 47906        ||  voice: 317 463-6383
- ----------------------------------------------------------------------
Macintosh Programming & Consulting --- currently seeking contract work

+++++++++++++++++++++++++++

From: stoodt@cis.umassd.edu (Michael Stoodt)
Date: 14 Jul 92 20:17:45 GMT
Organization: University of Massachusetts Dartmouth

In <1992Jul14.023655.8607@cs.yale.edu> kishon-amir@CS.YALE.EDU (amir kishon) writes:

>I would like to use the List Manager to implement a table of 3
>columns. The only thing different about this table is that when I
>click on a cell I would like to highlight the whole row which this
>cell belongs to (rather than just highlighting the particular cell).

>Unfortunately, the List Manager only highlights one cell given a mouse
>click. Any help would be much appreciated.

Don't make it a table of three columns; make it a table of one column
whose List DEFinition Draw() routine is smart enough to draw the three
pieces of data at positions to make them line up in columns.

+++++++++++++++++++++++++++

From: peter@cujo.curtin.edu.au (Peter N Lewis)
Organization: NCRPDA, Curtin University
Date: Thu, 16 Jul 1992 05:12:40 GMT

In article <54005@mentor.cc.purdue.edu>, haynes@mace.cc.purdue.edu (Carl W.
Haynes III) wrote:
>
> In article <1992Jul14.141836.1165@usenet.ins.cwru.edu> yjc@po.cwru.edu (Jerome Chan) writes:
> >
> >   If I use ResEdit and copy another LDEF from another application and put
> >it into mine, would I be able to use it if I pass the correct the resouce
> >ID as theProc in LNew(...)? Would it respond as the built-in LDEF?

> Just as an aside, LDEF's are probably the easiest of all the xDEFs to write
> They are very straightforward.  I know that the second edition of Macintosh
> Programming Secrets has an example you can look at.

This is very true, plus there is lots of sample code around - send a note
to stevej@ais.org (SteveJ) of TopSoft (makers of Free Software - but who
haven't quite figured out their motto yet :) and he can put you on to our
archive of source code, which has several ldefs buried in there somewhere.

BTW, another aside, copying an LDEF out of another application would be a
breach of copyright, probably even if you just use it for yourself, but
definitely otherwise - certainly it wouldn't be nice to do it without the
author's permission.

Oh yeah, if you want a pascal source you could grab my source code for Talk
(I think it's still at sumex-aim.stanford.edu in the /info-mac/source/pascal
directory, but they seem to be expiring files quickly these days...) which
has an LDEF that does pretty small icons and everything :-)  And if you ask
me, you can even use it :-)  Heck, you can use it even if you don't ask me,
as long as it's not a commercial program :)

Have fun,
   Peter.
_______________________________________________________________________
Peter N Lewis, NCRPDA, Curtin University       peter@cujo.curtin.edu.au
GPO Box U1987, Perth WA 6001, AUSTRALIA             FAX: +61 9 367 8141

+++++++++++++++++++++++++++

From: oster@well.sf.ca.us (David Phillip Oster)
Organization: Whole Earth 'Lectronic Link
Date: Thu, 16 Jul 1992 06:14:24 GMT

In article <14JUL199206531047@gergo.tamu.edu> dave@gergo.tamu.edu (Dave Martin) writes:
_>In article <1992Jul14.023655.8607@cs.yale.edu>, kishon-amir@CS.YALE.EDU (amir kishon) writes...
_>>I would like to use the List Manager to implement a table of 3
_>>columns. The only thing different about this table is that when I
_>>click on a cell I would like to highlight the whole row which this
_>>cell belongs to (rather than just highlighting the particular cell).
_>>if I click on either B1 B2 or B3 I would like B1 B2 and B3 to be
_>>highlighted.
_>What you could do is -- each time there is a mousedown event and it is in the
_>list area -- get the selected cell then do a LSetSelect(true,...) on the other
_>cells in that row.  That should work (I think).

Unfortunately, if the user clicks and drags, the LSetSelect() routine
won't be called until the user lets go of the mouse, so the user won't
see the correct hiliting behavior.

A better solution is a custom LDEF. These are really simple to write.
You start by declaring a procedure in your application program like
this:

(This comes from my Address Book Plus, available at any Macintosh
software store.)

    /* VectorLDef - map the names of named vectors to an ldef */
    pascal void VectorLDef(
            Integer mesg, Boolean hilit, Rect *lrect, Cell lcell,
            Integer offset, Integer len, ListHandle list){
        Vector v;
        NamedField nf;
        Ptr p;
        SignedByte state;

        switch(mesg){
        case lDrawMsg:
            /* nothing to draw if there is no vector or no entry for this row */
            if( NIL == (v = (Vector) (**list).refCon) ||
                NIL == (nf = (NamedField) TGet(v, lcell.v))){
                return;
            }
            /* keep nf from moving while p points into it */
            state = HSetLockState((Handle) nf);
            p = (Ptr) ((**nf).nameOffset + (Ptr) *nf);
            DrawNormalItem(lrect, Length(p), &p[1]);
            HSetState((Handle) nf, state);
            if(hilit){
                HiliteRect(lrect);
            }
            break;
        case lHiliteMsg:
            HiliteRect(lrect);
            break;
        case lCloseMsg:
            /* release the jump stub installed by InitOurLDef */
            DisposHandle((**list).listDefProc);
            break;
        }
    }

The point is, the handler for the drawMsg just draws the three
columns, in your case. HiliteRect() is a procedure of mine, like
InvertRect(), but it does the right thing on color displays. (See
Inside Mac Vol 5 for more info on hiliting in color.)

To connect it, you use this:

    typedef void (*SubrPtr)();

    /* a 68000 jump absolute instruction */
    typedef struct{
        short op;
        SubrPtr addr;
    }JmpBuf, *JmpBufPtr, **JmpBufHandle;

    #define JMPINSTRUCT 0x4EF9

    /* InitOurLDef - set our LDef to a captive routine. */
    void InitOurLDef(SubrPtr func, ListHandle list){
        JmpBufHandle ourLDEF;

        ourLDEF = (JmpBufHandle) NewHandle(sizeof(JmpBuf));
        (**ourLDEF).op = JMPINSTRUCT;
        (**ourLDEF).addr = func;
        (**list).listDefProc = (Handle) ourLDEF;
    }

The advantage of this technique is that your LDEF procedure has full
access to your application's globals, unlike a separately compiled LDEF
resource. Since the LDEF gets called with the Cell as one of the
parameters, you can store the data in any convenient way, and never
put it into the ListHandle at all. This makes modifying the list run
MUCH faster than keeping the data in the listhandle.

The same technique can be used with an <arbitrary> code definition
procedure, but you have to worry about setting up register A5 for
WDEFs, and depending on what crazy third party software the users
have, possibly CDEFs and MDEFs too.

---------------------------

End of C.S.M.P. Digest
**********************
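
[Editorial sketch, not part of the original digest.] The JmpBuf trick in David Oster's article above depends on the stub being exactly six bytes: the opcode word 0x4EF9 (JMP to an absolute long address) followed by a 32-bit target, stored big-endian as a 68000 fetches it. The portable C below only illustrates that byte layout; `encode_jmp_stub` and `JMP_STUB_SIZE` are names invented here, and the code does not touch the Toolbox or execute anything.

```c
#include <stdint.h>

enum { JMP_ABS_LONG = 0x4EF9, JMP_STUB_SIZE = 6 };

/* Hypothetical helper: write the stub one byte at a time so the result
 * does not depend on the host's byte order or struct padding. */
void encode_jmp_stub(uint8_t out[JMP_STUB_SIZE], uint32_t target)
{
    out[0] = (uint8_t)(JMP_ABS_LONG >> 8);    /* opcode, high byte: 0x4E */
    out[1] = (uint8_t)(JMP_ABS_LONG & 0xFF);  /* opcode, low byte:  0xF9 */
    out[2] = (uint8_t)(target >> 24);         /* address, most significant byte first */
    out[3] = (uint8_t)(target >> 16);
    out[4] = (uint8_t)(target >> 8);
    out[5] = (uint8_t)(target);
}
```

The struct in the article presumably relies on a 68K compiler that aligns fields on 2-byte boundaries; a compiler that pads `addr` to a 4-byte boundary would put two stray bytes between the opcode and the address, which is why this sketch writes bytes explicitly.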